Abstract

This paper provides both theoretical and algorithmic results for the $\ell_1$-relaxation
of the Cheeger cut problem. The $\ell_2$-relaxation, known as spectral clustering, only
loosely relates to the Cheeger cut; however, it is convex and leads to a simple
optimization problem. The $\ell_1$-relaxation, in contrast, is non-convex but is provably
equivalent to the original problem. The $\ell_1$-relaxation therefore trades convexity
for exactness, yielding improved clustering results at the cost of a more challenging
optimization. The first challenge is understanding convergence of algorithms.
This paper provides the first complete proof of convergence for algorithms that
minimize the $\ell_1$-relaxation. The second challenge entails comprehending the
$\ell_1$-energy landscape, i.e. the set of possible points to which an algorithm might
converge. We show that $\ell_1$-algorithms can get trapped in local minima that are
not globally optimal and we provide a classification theorem to interpret these
local minima. This classification gives meaning to these suboptimal solutions and
helps to explain, in terms of graph structure, when the $\ell_1$-relaxation provides the
solution of the original Cheeger cut problem.


1  Introduction 


Partitioning data points into sensible groups is a fundamental problem in machine learning. Given a
set of data points $V = \{x_1, \ldots, x_n\}$ and similarity weights $\{w_{i,j}\}_{1 \le i,j \le n}$, we consider the
balanced Cheeger cut problem [4]:

$$\text{Minimize } \mathcal{C}(S) = \frac{\sum_{x_i \in S} \sum_{x_j \in S^c} w_{i,j}}{\min(|S|, |S^c|)} \text{ over all subsets } S \subsetneq V. \tag{1}$$

Here $|S|$ denotes the number of data points in $S$ and $S^c$ is the complementary set of $S$ in $V$. While
this problem is NP-hard, it has the following exact continuous $\ell_1$-relaxation:


$$\text{Minimize } E(f) = \frac{\frac{1}{2} \sum_{i,j} w_{i,j} |f_i - f_j|}{\sum_i |f_i - \mathrm{med}(f)|} \text{ over all non-constant functions } f : V \to \mathbb{R}. \tag{2}$$


Here $\mathrm{med}(f)$ denotes the median of $f \in \mathbb{R}^n$ and $f_i = f(x_i)$. Recently, various algorithms have
been proposed [13, 6, 7, 1, 10, 5] to minimize $\ell_1$-relaxations of the Cheeger cut (1) and of other
related problems. Typically these $\ell_1$-algorithms provide excellent unsupervised clustering results
and improve upon the standard $\ell_2$ (spectral clustering) method [11, 14] in terms of both Cheeger
energy and classification error. However, complete theoretical guarantees of convergence for such
algorithms do not exist. This paper provides the first proofs of convergence for $\ell_1$-algorithms that
attempt to minimize (2).
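As an illustration of why the relaxation (2) is called exact, the following minimal sketch evaluates the combinatorial value of (1) and the relaxed energy of (2) on the indicator function of the same set. It assumes symmetric weights and a numerator normalised by 1/2; the toy graph and the function names `cheeger_cut` and `energy` are hypothetical and chosen only for this example.

```python
import numpy as np

def cheeger_cut(W, S):
    """Balanced Cheeger cut value C(S) for a boolean membership mask S."""
    S = np.asarray(S, dtype=bool)
    cut = W[np.ix_(S, ~S)].sum()              # total weight of edges leaving S
    return cut / min(S.sum(), (~S).sum())

def energy(W, f):
    """Relaxed energy E(f) = T(f) / B(f), with the 1/2 normalisation assumed."""
    f = np.asarray(f, dtype=float)
    T = 0.5 * np.sum(W * np.abs(f[:, None] - f[None, :]))   # total variation
    B = np.sum(np.abs(f - np.median(f)))                     # balance term
    return T / B

# Toy graph (assumption): two triangles joined by a single weak edge (2, 3).
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    W[i, j] = W[j, i] = w

S = np.array([True, True, True, False, False, False])
c_val = cheeger_cut(W, S)
e_val = energy(W, S.astype(float))   # energy of the indicator function of S
print(c_val, e_val)                  # the two values should coincide
```

On binary functions the two quantities agree, so any improvement of the relaxed energy over binary candidates is an improvement of the cut itself.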

In  this  work  we  consider  two  algorithms  for  minimizing  (2).  We  present  a  new  steepest  descent  (SD) 
algorithm  and  also  consider  a  slight  modification  of  the  inverse  power  method  (IPM)  from  [6].  We 
provide  convergence  results  for  both  algorithms  and  also  analyze  the  energy  landscape.  Specifically, 
we  give  a  complete  classification  of  local  minima.  This  understanding  of  the  energy  landscape 
provides  intuition  for  when  and  how  the  algorithms  get  trapped  in  local  minima.  Our  numerical 
experiments  show  that  the  two  algorithms  perform  equally  well  with  respect  to  the  quality  of  the 
achieved cut. Both algorithms produce state-of-the-art unsupervised clustering results. Finally, we
remark  that  the  SD  algorithm  has  a  better  theoretical  guarantee  of  convergence.  This  arises  from 
the  fact  that  the  distance  between  two  successive  iterates  necessarily  converges  to  zero.  In  contrast, 
we  cannot  guarantee  this  holds  for  the  IPM  without  further  assumptions  on  the  energy  landscape. 
The  simpler  mathematical  structure  of  the  SD  algorithm  also  provides  better  control  of  the  energy 
descent. 

Both algorithms take the form of a fixed point iteration $f^{k+1} \in \mathcal{A}(f^k)$, where $f \in \mathcal{A}(f)$ implies
that $f$ is a critical point. To prove convergence towards a fixed point typically requires three key
ingredients: the first is monotonicity of $\mathcal{A}$, that is $E(z) \le E(f)$ for all $z \in \mathcal{A}(f)$; the second
is some estimate that guarantees the successive iterates remain in a compact domain on which $E$
is continuous; lastly, some type of continuity of the set-valued map $\mathcal{A}$ is required. For set-valued
maps, closedness provides the correct notion of continuity [8]. Monotonicity of the IPM algorithm
was  proven  in  [6].  This  property  alone  is  not  enough  to  obtain  convergence,  and  the  closedness 
property  proves  the  most  challenging  ingredient  to  establish  for  the  algorithms  we  consider.  Section 
2  elucidates  the  form  these  properties  take  for  the  SD  and  IPM  algorithms.  In  Section  3  we  show 
that if the iterates of either algorithm approach a neighborhood of a strict local minimum then
both  algorithms  will  converge  to  this  minimum.  We  refer  to  this  property  as  local  convergence. 
When  the  energy  is  non-degenerate,  section  4  extends  this  local  convergence  to  global  convergence 
toward  critical  points  for  the  SD  algorithm  by  using  the  additional  structure  afforded  by  the  gradient 
flow.  In  Section  5  we  develop  an  understanding  of  the  energy  landscape  of  the  continuous  relaxation 
problem.  For  non-convex  problems  an  understanding  of  local  minima  is  crucial.  We  therefore 
provide  a  complete  classification  of  the  local  minima  of  (2)  in  terms  of  the  combinatorial  local 
minima  of  (1)  by  means  of  an  explicit  formula.  As  a  consequence  of  this  formula,  the  problem 
of  finding  local  minima  of  the  combinatorial  problem  is  equivalent  to  finding  local  minima  of  the 
continuous  relaxation.  The  last  section  is  devoted  to  numerical  experiments. 

We now present the SD algorithm. Rewrite the Cheeger functional (2) as $E(f) = T(f)/B(f)$,
where the numerator $T(f)$ is the total variation term and the denominator $B(f)$ is the balance term.
If $T$ and $B$ were differentiable, a mixed explicit-implicit gradient flow of the energy would take the
form $(f^{k+1} - f^k)/\tau^k = -(\nabla T(f^{k+1}) - E(f^k) \nabla B(f^k))/B(f^k)$, where $\{\tau^k\}$ denotes a sequence
of time steps. As $T$ and $B$ are not differentiable, particularly at the binary solutions of paramount
interest, we must consider instead their subgradients

$$\partial T(f) := \{ v \in \mathbb{R}^n : T(g) - T(f) \ge \langle v, g - f \rangle \ \ \forall g \in \mathbb{R}^n \}, \tag{3}$$

$$\partial_0 B(f) := \{ v \in \mathbb{R}^n : B(g) - B(f) \ge \langle v, g - f \rangle \ \ \forall g \in \mathbb{R}^n \text{ and } \langle \mathbf{1}, v \rangle = 0 \}. \tag{4}$$

Here $\mathbf{1} \in \mathbb{R}^n$ denotes the constant vector of ones. Also note that if $f$ has zero median then
$B(f) = \|f\|_1$ and $\partial_0 B(f) = \{ v \in \mathrm{sign}(f) \text{ s.t. } \mathrm{mean}(v) = 0 \}$. After an appropriate choice of time steps
we arrive at the SD Algorithm summarized in Table 1 (on left), i.e. a non-smooth variation of steepest
descent. A key property of the SD algorithm's iterates is that $\|f^{k+1} - f^k\|_2 \to 0$. This property
allows us to conclude global convergence of the SD algorithm in cases where we cannot conclude
convergence for the IPM algorithm. We also summarize the IPM algorithm from [6] in Table 1 (on
right). Compared to the original algorithm from [6], we have added the extra step to project onto
the sphere $S^{n-1}$, that is $f^{k+1} = h^k / \|h^k\|_2$. While we do not think that this extra step is essential,
it simplifies the proof of convergence.

The  successive  iterates  of  both  algorithms  belong  to  the  space 

$$S_0^{n-1} := \{ f \in \mathbb{R}^n : \|f\|_2 = 1 \text{ and } \mathrm{med}(f) = 0 \}. \tag{5}$$
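The last two steps of each iteration (subtracting the median, then rescaling to unit Euclidean norm) map a non-constant vector onto the space (5). A minimal sketch, assuming NumPy's median convention (the average of the two middle values when $n$ is even); the function name `project_to_S0` is hypothetical:

```python
import numpy as np

def project_to_S0(h_hat):
    """Subtract the median, then rescale to unit Euclidean norm."""
    h_hat = np.asarray(h_hat, dtype=float)
    h = h_hat - np.median(h_hat)        # zero-median representative
    norm = np.linalg.norm(h)
    if norm == 0.0:
        raise ValueError("constant vectors cannot be normalised")
    return h / norm

f = project_to_S0([3.0, -1.0, 0.5, 2.0, 7.0])
print(np.median(f), np.linalg.norm(f))
```

Because the energy is invariant under constant shifts and positive scalings, this step leaves the value of $E$ unchanged.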
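Table 1: $\mathcal{A}_{SD}$: SD Algorithm (left). $\mathcal{A}_{IPM}$: Modified IPM Algorithm [6] (right).

$\mathcal{A}_{SD}$: SD Algorithm.

$f^0$ nonzero function with $\mathrm{med}(f^0) = 0$.
$c$ positive constant.
while $E(f^k) - E(f^{k+1}) \ge \mathrm{TOL}$ do
  $v^k \in \partial_0 B(f^k)$
  $g^k = f^k + c\, v^k$
  $\hat h^k = \arg\min_{u \in \mathbb{R}^n} \; T(u) + \frac{E(f^k)}{2c} \|u - g^k\|_2^2$
  $h^k = \hat h^k - \mathrm{med}(\hat h^k) \mathbf{1}$
  $f^{k+1} = h^k / \|h^k\|_2$
end while

$\mathcal{A}_{IPM}$: Modified IPM Algorithm [6].

$f^0$ nonzero function with $\mathrm{med}(f^0) = 0$.
while $E(f^k) - E(f^{k+1}) \ge \mathrm{TOL}$ do
  $v^k \in \partial_0 B(f^k)$
  $D^k = \min_{\|u\|_2 \le 1} \; T(u) - E(f^k) \langle u, v^k \rangle$
  $g^k = \arg\min_{\|u\|_2 \le 1} \; T(u) - E(f^k) \langle u, v^k \rangle$ if $D^k < 0$
  $g^k = f^k$ if $D^k = 0$
  $h^k = g^k - \mathrm{med}(g^k) \mathbf{1}$
  $f^{k+1} = h^k / \|h^k\|_2$
end while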
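Both columns of Table 1 share the same outer loop: apply one step of the set-valued map and continue while the energy decrease is at least TOL. The sketch below isolates that loop only. The name `run_fixed_point` and the scalar toy problem are hypothetical, and the inner solves of either algorithm (the proximal step for SD, the constrained minimization for IPM) are deliberately left to the caller-supplied `step`.

```python
def run_fixed_point(step, energy, f0, tol=1e-8, max_iter=10_000):
    """Iterate f <- step(f) while the energy decrease is at least tol."""
    f, e = f0, energy(f0)
    history = [e]                      # energies of the accepted iterates
    for _ in range(max_iter):
        f_next = step(f)
        e_next = energy(f_next)
        if e - e_next < tol:           # decrease fell below TOL: stop
            break
        f, e = f_next, e_next
        history.append(e)
    return f, history

# Toy usage (assumption): a scalar "energy" x^2 and a step that halves x.
f_final, energies = run_fixed_point(step=lambda x: 0.5 * x,
                                    energy=lambda x: x * x,
                                    f0=1.0)
print(len(energies), energies[-1])
```

The recorded energies are strictly decreasing by construction, which mirrors the monotonicity property required of both maps.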


As the successive iterates have zero median, $\partial_0 B(f^k)$ is never empty. For example, we can take
$v^k \in \mathbb{R}^n$ so that $v^k(x_i) = 1$ if $f(x_i) > 0$, $v^k(x_i) = -1$ if $f(x_i) < 0$ and $v^k(x_i) = (n^- - n^+)/n^0$
if $f(x_i) = 0$, where $n^+$, $n^-$ and $n^0$ denote the cardinalities of the sets $\{x_i : f(x_i) > 0\}$,
$\{x_i : f(x_i) < 0\}$ and $\{x_i : f(x_i) = 0\}$, respectively. Other possible choices also exist, so that $v^k$ is
not uniquely defined. This idea, i.e. choosing an element from the subdifferential with mean zero,
was introduced in [6] and proves indispensable when dealing with median zero functions. As $v^k$ is
not uniquely defined in either algorithm, we must introduce the concepts of a set-valued map and a
closed map, which is the proper notion of continuity in this context:

Definition 1 (Set-valued Map, Closed Maps). Let $X$ and $Y$ be two subsets of $\mathbb{R}^n$. If for each $x \in X$
there is a corresponding set $F(x) \subset Y$ then $F$ is called a set-valued map from $X$ to $Y$. We denote
this by $F : X \rightrightarrows Y$. The graph of $F$, denoted $\mathrm{Graph}(F)$, is defined by

$$\mathrm{Graph}(F) = \{ (x, y) \in \mathbb{R}^n \times \mathbb{R}^n : x \in X, \ y \in F(x) \}.$$

A set-valued map $F$ is called closed if $\mathrm{Graph}(F)$ is a closed subset of $\mathbb{R}^n \times \mathbb{R}^n$. In other words, if
$y^k \in F(x^k)$, $x^k \to x^*$ and $y^k \to y^*$ then $y^* \in F(x^*)$.
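The explicit mean-zero subgradient described just before Definition 1 can be sketched as follows. This is only one admissible choice among many; the function name `zero_mean_subgradient` and the tolerance argument `atol` are hypothetical additions for the example.

```python
import numpy as np

def zero_mean_subgradient(f, atol=0.0):
    """One element of the mean-zero subdifferential of B at a zero-median f."""
    f = np.asarray(f, dtype=float)
    pos, neg, zero = f > atol, f < -atol, np.abs(f) <= atol
    n_pos, n_neg, n_zero = pos.sum(), neg.sum(), zero.sum()
    v = np.zeros_like(f)
    v[pos], v[neg] = 1.0, -1.0
    if n_zero > 0:
        v[zero] = (n_neg - n_pos) / n_zero   # common value on the zero set
    elif n_pos != n_neg:
        raise ValueError("f does not have zero median")
    if np.any(np.abs(v) > 1.0):              # v must stay inside sign(f)
        raise ValueError("f does not have zero median")
    return v

f = np.array([-2.0, -1.0, 0.0, 0.0, 3.0])    # median zero
v = zero_mean_subgradient(f)
print(v, v.sum(), v @ f)
```

For a zero-median input the value assigned on the zero set lies in $[-1, 1]$, so the result is a valid element of $\mathrm{sign}(f)$ with zero mean, and $\langle v, f \rangle = \|f\|_1 = B(f)$.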

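With these notations in hand we can write $f^{k+1} \in \mathcal{A}_{SD}(f^k)$ (SD algorithm) and $f^{k+1} \in \mathcal{A}_{IPM}(f^k)$
(IPM algorithm) where $\mathcal{A}_{SD}, \mathcal{A}_{IPM} : S_0^{n-1} \rightrightarrows S_0^{n-1}$ are the appropriate set-valued maps. The
notion of a closed map proves useful when analyzing the step $\hat h^k \in \mathcal{H}(f^k)$ in the SD algorithm.
Particularly,

Lemma 1 (Closedness of $\mathcal{H}(f)$). The set-valued map $\mathcal{H} : S_0^{n-1} \rightrightarrows \mathbb{R}^n$

$$\mathcal{H}(f) := \arg\min_u \left\{ T(u) + \frac{E(f)}{2c} \| u - (f + c\, \partial_0 B(f)) \|_2^2 \right\}$$

is closed.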

Currently, we can only show that lemma 1 holds at strict local minima for the analogous step, $g^k$,
of the IPM algorithm. That lemma 1 holds without this further restriction on $f \in S_0^{n-1}$ will allow
us to demonstrate stronger global convergence results for the SD algorithm. We pause briefly to
state closedness of the set-valued map $\partial_0 B(f) : S_0^{n-1} \rightrightarrows \mathbb{R}^n$, as we need this result in many of the
proofs that follow.

Lemma 2 (Closedness of $\partial_0 B(f)$). The set-valued map $\partial_0 B : S_0^{n-1} \rightrightarrows \mathbb{R}^n$

$$\partial_0 B(f) := \{ v \in \mathbb{R}^n : B(g) - B(f) \ge \langle v, g - f \rangle \ \ \forall g \in \mathbb{R}^n \text{ and } \langle \mathbf{1}, v \rangle = 0 \}$$

is closed.

Proof.  See  appendix  A.  □ 

2  Properties of $\mathcal{A}_{SD}$ and $\mathcal{A}_{IPM}$

This section establishes the required properties of the set-valued maps $\mathcal{A}_{SD}$ and $\mathcal{A}_{IPM}$
mentioned in the introduction. In section 2.1 we first elucidate the monotonicity and compactness of
$\mathcal{A}_{SD}$ and $\mathcal{A}_{IPM}$. Section 2.2 demonstrates that a local notion of closedness holds for each algorithm.
This form of closedness suffices to show local convergence toward isolated local minima (c.f. Section 3).
In particular, this more difficult and technical section is necessary as monotonicity alone
does not guarantee this type of convergence.


2.1  Monotonicity  and  Compactness 


We provide the monotonicity and compactness results for each algorithm in turn. Lemmas 3 and 4
establish monotonicity and compactness for $\mathcal{A}_{SD}$ while Lemmas 5 and 6 establish monotonicity and
compactness for $\mathcal{A}_{IPM}$.

Lemma 3 (Monotonicity of $\mathcal{A}_{SD}$). Let $f \in S_0^{n-1}$ and define $v$, $g$, $\hat h$ and $h$ according to the SD
algorithm. Then neither $\hat h$ nor $h$ is a constant vector. Moreover, the energy inequality

$$E(f) \ge E(h) + \frac{E(f)}{B(h)} \frac{\|\hat h - f\|_2^2}{c} \tag{6}$$

holds. As a consequence, if $z \in \mathcal{A}_{SD}(f)$ then $E(z) = E(h) < E(f)$ unless $z = f$.


Proof. The definition of $\hat h$ implies that $\frac{E(f)}{c}(g - \hat h) \in \partial T(\hat h)$. The definition of $\partial T$, the
invariance of $T$ under addition of a constant and the fact that $\langle v, \mathbf{1} \rangle = 0$ combine to imply

$$T(f) \ge T(\hat h) + \frac{E(f)}{c} \langle g - \hat h, f - \hat h \rangle = T(\hat h) + \frac{E(f)}{c} \|f - \hat h\|_2^2 - E(f) \langle v, \hat h - f \rangle \tag{7}$$

$$= T(h) + \frac{E(f)}{c} \|f - \hat h\|_2^2 - E(f) \langle v, h - f \rangle. \tag{8}$$

As also $v \in \partial_0 B(f)$ we have $E(f) B(h) \ge E(f) B(f) + E(f) \langle v, h - f \rangle$. Adding these two last
inequalities yields

$$T(f) + E(f) B(h) \ge T(h) + E(f) B(f) + \frac{E(f)}{c} \|\hat h - f\|_2^2.$$

In other words,

$$E(f) B(h) \ge T(h) + \frac{E(f)}{c} \|\hat h - f\|_2^2.$$

Note that $E(f) > 0$ as $f \in S_0^{n-1}$. Therefore, if $h$ were constant, then $B(h) = 0$ and $h = \hat h = f$.
This is a contradiction since $f \in S_0^{n-1}$ and is thus not constant. Consequently $B(h) > 0$, so we
may divide in the last expression to obtain (6). The last statement then follows as $E$ is invariant
under scalings. □
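For readers who want the two short algebraic steps at the end of the proof of Lemma 3 spelled out, the following worked display is a sketch that uses only the identities $T(f) = E(f) B(f)$ and $E(h) = T(h)/B(h)$, with $\hat h$ denoting the proximal step and $h$ its zero-median shift:

```latex
% Sum of the two inequalities in the proof:
T(f) + E(f)\,B(h) \;\ge\; T(h) + E(f)\,B(f) + \frac{E(f)}{c}\,\|\hat h - f\|_2^2 .
% Cancel T(f) = E(f) B(f) on both sides:
E(f)\,B(h) \;\ge\; T(h) + \frac{E(f)}{c}\,\|\hat h - f\|_2^2 .
% Divide by B(h) > 0 and use E(h) = T(h)/B(h):
E(f) \;\ge\; E(h) + \frac{E(f)}{B(h)}\,\frac{\|\hat h - f\|_2^2}{c} .
```

Since the last term is strictly positive whenever $\hat h \ne f$, the energy strictly decreases unless the step is stationary.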

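Lemma 4 (Compactness of $\mathcal{A}_{SD}$). Let $f^0 \in S_0^{n-1}$ and define a sequence of iterates
$(g^k, \hat h^k, h^k, f^{k+1})$ according to the SD algorithm. Then for any such sequence

$$\|\hat h^k\|_2 \le \|g^k\|_2, \quad 1 \le \|g^k\|_2 \le 1 + c\sqrt{n} \quad \text{and} \quad 0 < \|h^k\|_2 \le (1 + \sqrt{n}) \|\hat h^k\|_2. \tag{9}$$

Moreover, we have

$$\|\hat h^k - f^k\|_2 \to 0, \quad \mathrm{med}(\hat h^k) \to 0, \quad \|f^k - f^{k+1}\|_2 \to 0. \tag{10}$$

Therefore $S_0^{n-1}$ attracts the sequences $\{\hat h^k\}$ and $\{h^k\}$.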


Proof. To prove that $\|\hat h\|_2 \le \|g\|_2$, note

$$\hat h = \mathrm{prox}_\Phi(g) := \arg\min_u \left\{ \Phi(u) + \frac{1}{2} \|u - g\|_2^2 \right\} \quad \text{where} \quad \Phi(u) = \frac{c}{E(f)} T(u).$$

As proximal mappings are Lipschitz continuous with constant one and $\mathrm{prox}_\Phi(0) = 0$, we have

$$\|\hat h\|_2 = \|\mathrm{prox}_\Phi(g) - \mathrm{prox}_\Phi(0)\|_2 \le \|g\|_2. \tag{11}$$

As $B(f)$ is one-homogeneous, $\langle f, v \rangle = B(f) > 0$, so that computing $\|g\|_2^2$ directly shows

$$\|g\|_2^2 = 1 + 2c \langle f, v \rangle + c^2 \|v\|_2^2 \ge 1.$$

The inequality $\|g\|_2 \le 1 + c\sqrt{n}$ follows from the fact that $\|v\|_2 \le \sqrt{n} \|v\|_\infty \le \sqrt{n}$ and the
triangle inequality. The bound $0 < \|h\|_2$ follows since $h$ is not constant, and the upper bound
$\|h\|_2 \le (1 + \sqrt{n}) \|\hat h\|_2$ again follows from the triangle inequality.

For the second statement, as $f^k \in S_0^{n-1}$ it follows that $E(f^k) \ge \alpha > 0$. From (6), then,

$$\|\hat h^k - f^k\|_2^2 \le \frac{c\, B(h^k)}{\alpha} \left( E(f^k) - E(f^{k+1}) \right). \tag{12}$$

From (9) we have $B(h^k) \le \sqrt{n} \|h^k\|_2 \le (1 + \sqrt{n})(\sqrt{n} + nc)$ and therefore

$$\|\hat h^k - f^k\|_2^2 \le \frac{c}{\alpha} (1 + \sqrt{n})(\sqrt{n} + nc) \left( E(f^k) - E(f^{k+1}) \right) \to 0.$$

The last line follows as $E(f^k)$ is decreasing and bounded from below, and therefore converges. By
continuity of the median and the fact that $\mathrm{med}(f^k) = 0$, any limit point of the $\{f^k\}$ must have
median zero. As $\|\hat h^k - f^k\|_2 \to 0$, any limit point of the $\{\hat h^k\}$ must also have median zero, which
implies that $\mathrm{med}(\hat h^k) \to 0$ as well. The triangle inequality then implies $\|h^k - f^k\|_2 \to 0$, so that
$\|h^k\|_2 \to 1$ and $\|f^{k+1} - f^k\|_2 \to 0$ as desired. □
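The first bound in the proof of Lemma 4 rests on two general facts about proximal mappings of convex one-homogeneous functions: they are nonexpansive and they fix the origin. The sketch below checks both numerically for a simpler stand-in, the proximal map of $\lambda \|\cdot\|_1$ (soft-thresholding), rather than the total variation prox used by the SD algorithm, which has no closed form. The function name and the test vectors are assumptions made for the illustration.

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal map of lam * ||.||_1 (stand-in for the TV proximal step)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

rng = np.random.default_rng(0)
x, y = rng.normal(size=8), rng.normal(size=8)
lam = 0.3

# Nonexpansiveness: ||prox(x) - prox(y)|| <= ||x - y||
lhs = np.linalg.norm(soft_threshold(x, lam) - soft_threshold(y, lam))
rhs = np.linalg.norm(x - y)

# prox(0) = 0, hence ||prox(x)|| <= ||x|| as in estimate (11)
prox_zero = soft_threshold(np.zeros(8), lam)
print(lhs <= rhs, np.linalg.norm(soft_threshold(x, lam)) <= np.linalg.norm(x))
```

The same two properties, applied to the scaled total variation, give the norm bound on the proximal step without solving the inner problem explicitly.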

By the monotonicity result of Hein and Bühler [6] we have

Lemma 5 (Monotonicity of $\mathcal{A}_{IPM}$). Let $f \in S_0^{n-1}$. If $z \in \mathcal{A}_{IPM}(f)$ then $E(z) < E(f)$ unless
$z = f$.

To prove convergence for $\mathcal{A}_{IPM}$ using our techniques, we must also maintain control over the iterates
after subtracting the median. This control is provided by the following lemma.

Lemma 6 (Compactness of $\mathcal{A}_{IPM}$). Let $f \in S_0^{n-1}$ and define $v$, $D$, $g$ and $h$ according to the IPM.

1. The minimizer is unique when $D < 0$, i.e. $g \in S^{n-1}$ is a single point.

2. $1 \le \|h\|_2 \le 1 + \sqrt{n}$. In particular, $\mathcal{A}_{IPM}(f)$ is always well-defined for a given choice of
$v \in \partial_0 B(f)$.


Proof.

(1.) Let $D < 0$, and suppose there existed two distinct minimizers $g_1$ and $g_2$ that lie on the boundary
of the unit ball. For any $0 < \theta < 1$ define $g_\theta = \theta g_1 + (1 - \theta) g_2$ and note that $\|g_\theta\|_2 < 1$. By
convexity of $T$ and linearity of the inner product,

$$T(g_\theta) - E(f) \langle g_\theta, \partial_0 B(f) \rangle \le \theta D + (1 - \theta) D = D.$$

By one-homogeneity of $T$ and the inner product, and the fact that $D$ is the global minimum, it follows
that

$$\|g_\theta\|_2 D \le \|g_\theta\|_2 \left( T\!\left( \frac{g_\theta}{\|g_\theta\|_2} \right) - E(f) \left\langle \frac{g_\theta}{\|g_\theta\|_2}, \partial_0 B(f) \right\rangle \right) \le D.$$

This cannot happen as $D < 0$ and $\|g_\theta\|_2 < 1$.
(2.) If $D = 0$ then the statement holds trivially. Otherwise $D < 0$, so that if $\|h\|_2 < 1$ then

$$\|h\|_2 D \le \|h\|_2 \left( T\!\left( \frac{h}{\|h\|_2} \right) - E(f) \left\langle \frac{h}{\|h\|_2}, \partial_0 B(f) \right\rangle \right) = T(h) - E(f) \langle h, \partial_0 B(f) \rangle = D.$$

The last equality follows since, due to the choice of subdifferential $\partial_0 B$, we may add a constant
to the global minimizer $g$ without changing the value of the expression. If $\|h\|_2 < 1$ we therefore
arrive at a contradiction. From the triangle inequality it follows that also $\|h\|_2 \le 1 + \sqrt{n}$.
(3.) The one-homogeneity of $B$ and the definition of the subgradient combine to show
$\langle h, \partial_0 B(f) \rangle \le B(h)$. When $D < 0$ we have

$$T(h) - E(f) \langle h, \partial_0 B(f) \rangle = T(g) - E(f) \langle g, \partial_0 B(f) \rangle < 0$$

so that $T(h) < E(f) B(h)$. As $\|h\|_2 \ge 1$ and $\mathrm{med}(h) = 0$ we know $h$ is non-constant, so we can
divide by $B(h)$ to obtain $E(\mathcal{A}_{IPM}(f)) = E(h) < E(f)$ as desired. If $D = 0$ then $\mathcal{A}_{IPM}(f) = f$, so the
claim follows. □

2.2  Closedness  Properties 

The  final  ingredient  to  prove  local  convergence  is  some  form  of  closedness.  We  require  closedness 
of  the  set  valued  maps  A  at  strict  local  minima  of  the  energy.  As  the  energy  (2)  is  invariant  under 
constant shifts and scalings, the usual notion of a strict local minimum on $\mathbb{R}^n$ does not apply. We
must therefore remove the effects of these invariances when referring to a local minimum as strict.
To this end, define the spherical and annular neighborhoods on $S_0^{n-1}$ by

$$B_\epsilon(f^\infty) := \{ \|f - f^\infty\|_2 \le \epsilon \} \cap S_0^{n-1}, \qquad A_{\delta,\epsilon}(f^\infty) := \{ \delta \le \|f - f^\infty\|_2 \le \epsilon \} \cap S_0^{n-1}.$$

With these in hand we introduce the proper definition of a strict local minimum.

Definition 2 (Strict Local Minima). Let $f^\infty \in S_0^{n-1}$. We say $f^\infty$ is a strict local minimum of the
energy if there exists $\epsilon > 0$ so that $f \in B_\epsilon(f^\infty)$ and $f \ne f^\infty$ imply $E(f) > E(f^\infty)$.

This definition then allows us to formally define closedness at a strict local minimum in Definition
3. For the IPM algorithm this is the only form of closedness we are able to establish. Closedness at
an arbitrary $f \in S_0^{n-1}$ (c.f. lemma 1) does in fact hold for the SD algorithm. Once again, this fact
manifests itself in the stronger global convergence results for the SD algorithm in section 4.

Definition 3 (CLM/CSLM Mappings). Let $\mathcal{A}(f) : S_0^{n-1} \rightrightarrows S_0^{n-1}$ denote a set-valued mapping. We
say $\mathcal{A}(f)$ is closed at local minima (CLM) if $z^k \in \mathcal{A}(f^k)$ and $f^k \to f^\infty$ imply $z^k \to f^\infty$ whenever
$f^\infty$ is a local minimum of the energy. If $z^k \to f^\infty$ holds only when $f^\infty$ is a strict local minimum
then we say $\mathcal{A}(f)$ is closed at strict local minima (CSLM).

The CLM property for the SD algorithm, provided by lemma 7, follows as a straightforward
consequence of lemma 1. The CSLM property for the IPM algorithm provided by lemma 8 requires the
additional hypothesis that the local minimum is strict.

Lemma 7 (CLM Property for $\mathcal{A}_{SD}$). For $f \in S_0^{n-1}$ define $g$, $\hat h$ and $h$ according to the SD algorithm.
Then $\mathcal{A}_{SD}(f)$ defines a CLM mapping.

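Proof. Given $f^k \to f^\infty$ and $z^k \in \mathcal{A}_{SD}(f^k)$, let $\hat h^k \in \mathcal{H}(f^k)$ be such that $h^k = \hat h^k - \mathrm{med}(\hat h^k) \mathbf{1}$ and
$z^k = h^k / \|h^k\|_2$. As $\{\hat h^k\}$ lies in a compact set, any subsequence of $\{\hat h^k\}$ has a further convergent
subsequence $\hat h^{k_i} \to \hat h^\infty$. As $f^{k_i} \to f^\infty$ and $\mathcal{H}$ is closed, $\hat h^\infty \in \mathcal{H}(f^\infty)$. Thus, there exists
$v^\infty \in \partial_0 B(f^\infty)$ so that

$$\hat h^\infty = \arg\min_u \left\{ T(u) + \frac{E(f^\infty)}{2c} \| u - f^\infty - c v^\infty \|_2^2 \right\}.$$

Note this happens if and only if $0 \in \partial T(\hat h^\infty) + \frac{E(f^\infty)}{c} (\hat h^\infty - f^\infty - c v^\infty)$. From the fact that
$\partial T(f^\infty) - E(f^\infty) v^\infty = \partial T(f^\infty) + \frac{E(f^\infty)}{c} (f^\infty - f^\infty - c v^\infty)$ and the uniqueness of minimizers,
it follows that $\hat h^\infty = f^\infty$ provided $0 \in \partial T(f^\infty) - E(f^\infty) v^\infty$. In this case, both $h^{k_i}$ and $z^{k_i}$ must
then converge to $f^\infty$ as well. As any subsequence of $\{z^k\}$ has a further subsequence that converges
to $f^\infty$, in fact the whole sequence converges to $f^\infty$ as desired.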

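It remains only to show that $0 \in \partial T(f^\infty) - E(f^\infty) v^\infty$ whenever $v^\infty \in \partial_0 B(f^\infty)$ and $f^\infty$ is a local
minimum of the energy. Take $\epsilon > 0$ so that $f \in B_\epsilon(f^\infty)$ implies $E(f) \ge E(f^\infty)$, and suppose that
$E(f^\infty) v^\infty \notin \partial T(f^\infty)$. By definition, there then exists a $g \in \mathbb{R}^n$ so that

$$T(g) - T(f^\infty) < E(f^\infty) \langle v^\infty, g - f^\infty \rangle = E(f^\infty) \langle v^\infty, g \rangle - T(f^\infty).$$

For $0 < \theta < 1$ set $g_\theta := (1 - \theta) f^\infty + \theta g$, then compute

$$T(g) < E(f^\infty) \langle v^\infty, g \rangle = \frac{1}{\theta} E(f^\infty) \langle v^\infty, g_\theta - (1 - \theta) f^\infty \rangle$$

$$T(g_\theta) \le (1 - \theta) T(f^\infty) + \theta T(g) < E(f^\infty) \langle v^\infty, g_\theta \rangle$$

by using the fact that $B(f^\infty) = \langle v^\infty, f^\infty \rangle$ and the fact that $T$ is convex. By definition of $\partial_0 B(f^\infty)$
it follows that $\langle v^\infty, g_\theta \rangle \le B(g_\theta)$, which yields

$$T(g_\theta) < E(f^\infty) B(g_\theta)$$

whenever $0 < \theta < 1$. This implies $B(g_\theta) > 0$, so that $E(g_\theta) < E(f^\infty)$ for all $0 < \theta < 1$ as
well. Put $\tilde g_\theta = g_\theta - \mathrm{med}(g_\theta) \mathbf{1}$ and note that $\tilde g_\theta \to f^\infty$ as $\theta \to 0$ since $f^\infty$ has zero median. But
$E(\|\tilde g_\theta\|_2^{-1} \tilde g_\theta) = E(\tilde g_\theta) = E(g_\theta) < E(f^\infty)$ and $\|\tilde g_\theta\|_2^{-1} \tilde g_\theta \in B_\epsilon(f^\infty)$ for all $\theta$ sufficiently close to
zero, which contradicts the assumption that $f^\infty$ is a local minimizer. □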

Lemma 8 (CSLM Property for $\mathcal{A}_{IPM}$). For $f \in S_0^{n-1}$ define $v$, $D$, $g$, $h$ according to the IPM. Then
$\mathcal{A}_{IPM}(f)$ defines a CSLM mapping.

Proof. Consider a sequence of points $f^k \in B_\epsilon$ with $f^k \to f^\infty$. Let $z^k \in \mathcal{A}_{IPM}(f^k)$ and also let
$(D^k, g^k, h^k)$ denote the intermediate steps in the algorithm above. We will show any subsequence of
$\{z^k\}$ has a further subsequence that converges to $f^\infty$.

Define 

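$$\mathcal{K} := \{ k \in \mathbb{N} : D^k = 0 \},$$

and consider an arbitrary subsequence of the $z^k$. If the subsequence has only finitely many elements in
$\mathcal{K}^c$ then $z^k = f^k$ for all but finitely many elements of the subsequence. Since then $z^k = f^k$ for all but
finitely many $k$ and $f^k \to f^\infty$ by hypothesis, the whole subsequence converges to $f^\infty$. Otherwise,
an infinite number of terms lie in $\mathcal{K}^c$. By restricting to only those elements of the subsequence that
lie in $\mathcal{K}^c$ and by extracting enough convergent subsequences of $(f^k, g^k, h^k, z^k)$ we may assume
that

$$f^k \to f^\infty, \quad g^k \to g^*, \quad h^k \to h^* = g^* - \mathrm{med}(g^*) \mathbf{1}, \quad z^k \to z^* = \frac{h^*}{\|h^*\|_2}.$$

Since the subdifferential $\partial_0 B(f^k)$ is closed, we may assume (by extracting yet another subsequence)
that $\partial_0 B(f^k) \to v^* \in \partial_0 B(f^\infty)$. Define

$$D^* := \min_{\|u\|_2 \le 1} T(u) - E(f^\infty) \langle u, v^* \rangle$$

and assume for the sake of contradiction that

$$D^* < T(g^*) - E(f^\infty) \langle g^*, v^* \rangle,$$

i.e. that $g^*$ does not attain the minimum. If this were the case, then there exists a $q^*$ with $\|q^*\|_2 \le 1$
with the property that

$$T(q^*) - E(f^\infty) \langle q^*, v^* \rangle < T(g^*) - E(f^\infty) \langle g^*, v^* \rangle.$$

But as

$$T(q^*) - E(f^\infty) \langle q^*, v^* \rangle = \lim_{k \to \infty} T(q^*) - E(f^k) \langle q^*, \partial_0 B(f^k) \rangle$$

$$T(g^*) - E(f^\infty) \langle g^*, v^* \rangle = \lim_{k \to \infty} T(g^k) - E(f^k) \langle g^k, \partial_0 B(f^k) \rangle$$

we see that $T(q^*) - E(f^k) \langle q^*, \partial_0 B(f^k) \rangle < T(g^k) - E(f^k) \langle g^k, \partial_0 B(f^k) \rangle$ for all $k$ sufficiently
large, which contradicts the definition of $g^k$ as the global minimizer.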

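Suppose now that $z^* \ne f^\infty$, and recall that $D^k < 0$. Then from the preceding argument, we know

$$D^* = T(g^*) - E(f^\infty) \langle g^*, v^* \rangle \le 0.$$

From the fact that $\langle v^*, \mathbf{1} \rangle = 0$ we have

$$T(h^*) - E(f^\infty) \langle h^*, v^* \rangle = T(g^*) - E(f^\infty) \langle g^*, v^* \rangle \le 0.$$

By using one-homogeneity of $T$ it then follows that

$$P^* := T(z^*) - E(f^\infty) \langle z^*, v^* \rangle \le 0$$

as well. Define $z_\theta = \theta z^* + (1 - \theta) f^\infty$ and also

$$\tilde z_\theta := z_\theta - \mathrm{med}(z_\theta) \mathbf{1}.$$

Again as $\langle v^*, \mathbf{1} \rangle = 0$, the convexity of $T$ and linearity of the inner product imply

$$T(\tilde z_\theta) - E(f^\infty) \langle \tilde z_\theta, v^* \rangle = T(z_\theta) - E(f^\infty) \langle z_\theta, v^* \rangle \le \theta P^* \le 0.$$

The fact that $B(\tilde z_\theta) \ge \langle \tilde z_\theta, v^* \rangle$ then implies the inequality

$$E(\tilde z_\theta) \le E(f^\infty)$$

holds for all $0 < \theta < 1$. As $\mathrm{med}(z_\theta) \to 0$ as $\theta \to 0$, by the reverse triangle inequality it follows
that for all $\theta$ small

$$\|\tilde z_\theta\|_2 \ge 1 - 2\theta - |\mathrm{med}(z_\theta)| \sqrt{n} \ge 1/4.$$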

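From scale invariance of the energy, for all such $\theta$ we have that both

$$E\!\left( \frac{\tilde z_\theta}{\|\tilde z_\theta\|_2} \right) \le E(f^\infty) \qquad \text{and} \qquad 0 < \left\| \frac{\tilde z_\theta}{\|\tilde z_\theta\|_2} - f^\infty \right\|_2 \to 0$$

hold as $\theta \to 0$. Since $\tilde z_\theta / \|\tilde z_\theta\|_2 \in S_0^{n-1}$ this contradicts the fact that $f^\infty$ is a strict local minimum
in $S_0^{n-1}$. Thus, we must have $z^* = f^\infty$. Therefore any subsequence of $\{z^k\}$ has a further subsequence
that converges to $f^\infty$. This implies that in fact the whole sequence $z^k$ converges to $f^\infty$ as desired.

□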


3  Local Convergence of $\mathcal{A}_{SD}$ and $\mathcal{A}_{IPM}$ at Strict Local Minima

Due to the lack of convexity of the energy (2), at best we can only hope to obtain convergence
to a local minimum of the energy. An analogue of Lyapunov's method from differential equations
allows us to show that such convergence does occur provided the iterates reach a neighborhood of
an isolated local minimum. To apply the lemmas from section 2 we must assume that $f^\infty \in S_0^{n-1}$
is a local minimum of the energy. We will assume further that $f^\infty$ is an isolated critical point of the
energy  according  to  the  following  definition. 

Definition 4 (Isolated Critical Points). Let $f \in S_0^{n-1}$. We say that $f$ is a critical point of the energy
$E(f)$ if there exist $w \in \partial T(f)$ and $v \in \partial_0 B(f)$ so that

$$0 = w - E(f) v.$$

This generalizes the usual quotient rule

$$0 = \nabla T(f) - E(f) \nabla B(f).$$

If there exists $\epsilon > 0$ so that $f$ is the only critical point in $B_\epsilon(f)$ we say $f$ is an isolated critical
point of the energy.
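To see where the smooth condition in Definition 4 comes from, the following short derivation applies the quotient rule under the hypothetical assumption that $T$ and $B$ are differentiable at $f$ with $B(f) > 0$:

```latex
\nabla E(f)
  = \nabla\!\left( \frac{T(f)}{B(f)} \right)
  = \frac{B(f)\,\nabla T(f) - T(f)\,\nabla B(f)}{B(f)^2}
  = \frac{\nabla T(f) - E(f)\,\nabla B(f)}{B(f)} .
```

Hence $\nabla E(f) = 0$ exactly when $\nabla T(f) - E(f) \nabla B(f) = 0$, and the definition replaces the two gradients by elements of the corresponding subdifferentials.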

Note that as any local minimum is a critical point of the energy, if $f^\infty$ is an isolated critical point
and a local minimum then it is necessarily a strict local minimum. The CSLM property therefore
applies.

Finally,  to  show  convergence,  the  set-valued  map  A  must  possess  one  further  property,  i.e.  the 
critical  point  property. 

Definition 5 (Critical Point Property). Let $\mathcal{A}(f) : S_0^{n-1} \rightrightarrows S_0^{n-1}$ denote a set-valued mapping. We
say that $\mathcal{A}(f)$ satisfies the critical point property (CP property) if given any sequence satisfying
$f^{k+1} \in \mathcal{A}(f^k)$, all limit points of $\{f^k\}$ are critical points of the energy.


Analogously to the CLM property, for the SD algorithm the CP property follows as a direct
consequence of lemma 1. For the proof, see the first statement in theorem 2. We establish this for the IPM
algorithm in the following lemma.

Lemma 9 (CP Property for the IPM Algorithm). The set-valued mapping $\mathcal{A}_{IPM}(f) : S_0^{n-1} \rightrightarrows S_0^{n-1}$
satisfies the critical point property.

Proof. Let $f^{k_j} \to f^* \in S_0^{n-1}$ denote a convergent subsequence. Define $v^{k_j}$, $D^{k_j}$ and $g^{k_j}$
according to the IPM algorithm. By compactness, we can extract enough further subsequences (still
denoted $k_j$) to find

$$v^{k_j} \to v^*, \qquad g^{k_j} \to g^*.$$

The fact that $v^* \in \partial_0 B(f^*)$ follows from the closedness established in lemma 2. Define

$$D^* := \min_{\|u\|_2 \le 1} T(u) - E(f^*) \langle u, v^* \rangle.$$

As in the proof of the CSLM property we know $g^*$ must attain the minimum, i.e. $D^* = T(g^*) - E(f^*) \langle g^*, v^* \rangle$.
Suppose that $D^* < 0$. Then as

$$D^* = \lim_{j \to \infty} D^{k_j}$$

there exists $J$ sufficiently large so that $j \ge J$ implies

$$D^{k_j} \le \tfrac{1}{2} D^* < 0.$$

But this implies

$$E(f^{k_j + 1}) < E(f^*)$$

for all $j$ sufficiently large, a contradiction. Thus $D^* = 0$ and $f^*$ must be the minimizer of

$$\min_{\|u\|_2 \le 1} T(u) - E(f^*) \langle u, v^* \rangle.$$

This implies $0 \in \partial T(f^*) - E(f^*) v^*$ so $f^*$ is a critical point as desired. □

The proof of local convergence utilizes a version of Lyapunov's direct method for set-valued maps,
and we adapt this technique from the strategy outlined in [8]. We first demonstrate that if any
iterate $f^k$ lies in a sufficiently small neighborhood $B_\gamma(f^\infty)$ of the strict local minimum then all
subsequent iterates remain in the neighborhood $B_\epsilon(f^\infty)$ in which $f^\infty$ is an isolated critical point.

By compactness and the CP property, any subsequence of $\{f^k\}$ must have a further subsequence
that converges to the only critical point in $B_\epsilon(f^\infty)$, i.e. $f^\infty$. This implies that the whole sequence
must converge to $f^\infty$ as well. We formalize this argument in lemma 10 and its corollary theorem 1.

Lemma 10 (Lyapunov Stability at Strict Local Minima). Suppose $\mathcal{A}(f)$ is a monotonic, CSLM
mapping. Fix $f^0 \in S_0^{n-1}$ and let $\{f^k\}$ denote any sequence satisfying $f^{k+1} \in \mathcal{A}(f^k)$. If $f^\infty$ is a
strict local minimum of the energy, then for any $\epsilon > 0$ there exists a $\gamma > 0$ so that if $f^0 \in B_\gamma(f^\infty)$
then $\{f^k\} \subset B_\epsilon(f^\infty)$.

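Proof. The proof follows [8]. By taking $\epsilon$ smaller if necessary, we can assume that $f^\infty$ is a strict
local minimum on $B_\epsilon(f^\infty)$. From the CSLM property, we can choose $0 < \delta < \epsilon$ small enough to
guarantee

$$f \in B_\delta \text{ implies } \mathcal{A}(f) \subset B_\epsilon.$$

For such a choice of $\delta$, define

$$\mu := \min_{f \in A_{\delta,\epsilon}(f^\infty)} E(f) - E(f^\infty) > 0.$$

By continuity of $E$ on $S_0^{n-1}$ we can then choose $0 < \gamma < \delta$ small enough so that $f \in B_\gamma$
implies $E(f) - E(f^\infty) < \mu$. Take any initial point $f^0 \in B_\gamma$ and suppose the sequence leaves $B_\delta$.
Let $K$ be the first integer so that $\|f^K - f^\infty\|_2 > \delta$. By assumption, since $f^{K-1} \in B_\delta$ we must have
$f^K \in A_{\delta,\epsilon}(f^\infty)$. But then

$$E(f^K) - E(f^\infty) \ge \mu$$

by definition as well. However, since $E$ always decreases we must have

$$\mu > E(f^0) - E(f^\infty) \ge E(f^K) - E(f^\infty) \ge \mu,$$

which is a contradiction. Thus, the whole sequence $\{f^k\} \subset B_\delta \subset B_\epsilon$. □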
Theorem 1 (Local Convergence at Isolated Critical Points). Let $\mathcal{A}(f) : S_0^{n-1} \rightrightarrows S_0^{n-1}$ denote a
monotonic, CSLM, CP mapping. Let $f^0 \in S_0^{n-1}$ and suppose $\{f^k\}$ is any sequence satisfying
$f^{k+1} \in \mathcal{A}(f^k)$. Let $f^\infty$ denote a local minimum that is an isolated critical point of the energy. If
$f^0 \in B_\gamma(f^\infty)$ for $\gamma > 0$ sufficiently small then $f^k \to f^\infty$.

Proof. Choose $\epsilon > 0$ so that $f^\infty$ is the only critical point of the energy in $B_\epsilon$. By stability of
CSLM mappings, we can choose $\gamma > 0$ so that $f^0 \in B_\gamma$ implies $\{f^k\} \subset B_\epsilon$. By compactness
of $\{f^k\}$ and the critical point property, any subsequence has a further subsequence that converges
to a critical point of the energy that lies in $B_\epsilon$. As $f^\infty$ is the only such critical point, we find
any subsequence of $\{f^k\}$ has a further subsequence that converges to $f^\infty$, so the whole sequence
converges as desired. □

Note that both algorithms satisfy the hypothesis of theorem 1, and therefore possess identical
local convergence properties. A slight modification of the proof of theorem 1 yields the following
corollary that also applies to both algorithms.

Corollary 1. Let $f^0 \in S_0^{n-1}$ be arbitrary, and define $f^{k+1} \in \mathcal{A}(f^k)$ according to either algorithm.
If any accumulation point $f^*$ of the sequence $\{f^k\}$ is both an isolated critical point of the energy
and a local minimum, then the whole sequence $f^k \to f^*$.

4  Global Convergence for the SD Algorithm

To this point the convergence properties of both algorithms appear identical. However, we have yet to take full advantage of the superior mathematical structure afforded by the SD algorithm. In particular, from lemma 4 we know that $\|f^{k+1} - f^k\|_2 \to 0$ without any further assumptions regarding the initialization of the algorithm or the energy landscape. This fact combines with the fact that lemma 1 also holds globally for $f \in S_0^{n-1}$ to yield theorem 2. Once again, we arrive at this conclusion by adapting the proof from [8].

Theorem 2 (Convergence of the SD Algorithm). Take $f^0 \in S_0^{n-1}$ and fix a constant $c > 0$. Let $\{f^k\}$ denote any sequence satisfying $f^{k+1} \in A_{SD}(f^k)$. Then

1. Any accumulation point $f^*$ of the sequence is a critical point of the energy.

2. Either the sequence converges, or the set of accumulation points forms a continuum in $S_0^{n-1}$.


Proof. (1.) The proof is inspired by [8]. Let $f^{k_i} \to f^*$ denote a convergent subsequence. As $\{f^{k_i+1}\} \subset S_0^{n-1}$, we may assume (after extracting a further subsequence if necessary) that there exists $f' \in S_0^{n-1}$ so that, as $i \to \infty$,

$$f^{k_i} \to f^*, \tag{13}$$

$$f^{k_i+1} \to f'. \tag{14}$$

However,  because  of  (10)  we  have 

f'  =  f*  =  lim  G  n{f^). 

(15) 

i—>-oo 


Therefore, as $f^{k_i} \to f^*$ and $\mathcal{H}$ is closed we have $f^* \in \mathcal{H}(f^*)$. By definition of $\mathcal{H}(f^*)$, if $f^* \in \mathcal{H}(f^*)$ then there exists $y^* \in Y^c(f^*)$ so that

$$f^* = \arg\min_u \left\{ T(u) + \frac{E(f^*)}{2c} \|u - y^*\|_2^2 \right\}.$$

Therefore there exists $w^* \in \partial T(f^*)$ so that $0 = c w^* + E(f^*)(f^* - y^*)$. By definition of $Y^c(f^*)$ there exists $v^* \in \partial_0 B(f^*)$ so that

$$0 = c w^* + E(f^*)\big(f^* - (f^* + c v^*)\big) = c\big(w^* - E(f^*) v^*\big).$$

Thus $f^*$ is a critical point of the energy according to definition 5.

(2.) For any sequence generated by the algorithm, $\|f^{k+1} - f^k\|_2 \to 0$ according to (10). Moreover, the iterates lie in the bounded set $S_0^{n-1} \subset \mathbb{R}^n$. The hypotheses of Theorem 26.1 of [9] are therefore satisfied, giving the desired conclusion. □

We might hope to rule out the second possibility in statement 2 by showing that $E$ can never have an uncountable number of critical points. Unfortunately, we can exhibit (cf. section 5.3) simple examples to show that a continuum of local or global minima can in fact happen. This degeneracy of a continuum of critical points arises from a lack of uniqueness in the underlying combinatorial problem. We explore this aspect of convergence further in section 5.

By assuming additional structure in the energy landscape we can generalize the local convergence result, theorem 1, to yield global convergence of both algorithms. This is the content of corollary 2 for the SD algorithm and the content of corollary 3 for the IPM algorithm. The hypotheses required for each corollary clearly demonstrate the benefit of knowing a priori that $\|f^{k+1} - f^k\|_2 \to 0$ occurs for the SD algorithm. For the IPM algorithm, we can only deduce this a posteriori from the fact that the iterates converge.

Corollary 2. Let $f^0 \in S_0^{n-1}$ be arbitrary and define $f^{k+1} \in A_{SD}(f^k)$. If the energy has only countably many critical points in $S_0^{n-1}$ then $\{f^k\}$ converges.

Corollary 3. Let $f^0 \in S_0^{n-1}$ be arbitrary and define $f^{k+1} \in A_{IPM}(f^k)$. Suppose all critical points of the energy are isolated in $S_0^{n-1}$ and are either local maxima or local minima. Then $\{f^k\}$ converges.

Proof. Let $\{f^k\} \subset S_0^{n-1}$ denote any sequence satisfying $f^{k+1} \in A_{IPM}(f^k)$. Assume first that $0 \notin \partial T(f^k) - E(f^k)\partial_0 B(f^k)$ for infinitely many $k$. Then there exists a subsequence $\{f^{k_j}\}$ with the property that $E(f^{k_j+1}) < E(f^{k_j})$ for all $j$. We can extract a further subsequence (still denoted $\{f^{k_j}\}$) and a point $f^*$ so that $f^{k_j} \to f^*$. By the CP property it follows that $f^*$ is a critical point, hence either a local maximum or a local minimum. However, as $E(f^{k_j}) > E(f^*)$ for all $j$ and $\|f^{k_j} - f^*\|_2 \to 0$ we conclude that $f^*$ cannot be a local maximum. Thus, as all critical points are isolated we know $f^*$ is actually a strict local minimum, so $f^k \to f^*$ by corollary 1.

Otherwise, there exists $K$ sufficiently large so that $0 \in \partial T(f^k) - E(f^k)\partial_0 B(f^k)$ for all $k \ge K$. By definition of the iterates, this implies that $f^{k+1} = f^k$ for all $k \ge K$, so the algorithm converges. □

While at first glance corollary 3 provides hope that global convergence holds for the IPM algorithm, our simple examples (cf. section 5.3) demonstrate that even benign graphs with well-defined cuts have critical points of the energy that are neither local maxima nor local minima.

5  Energy  Landscape  of  the  Cheeger  Functional 

This section demonstrates that the continuous problem (2) provides an exact relaxation of the combinatorial problem (1). Specifically, we provide an explicit formula that gives an exact correspondence between the global minimizers of the continuous problem and the global minimizers of the combinatorial problem. This extends previous work [13, 12, 10] on the relationship between the global minima of (1) and (2). We also completely classify the local minima of the continuous problem by introducing a notion of local minimum for the combinatorial problem. Any local minimum of the combinatorial problem then determines a local minimum of the continuous problem by means of an explicit formula, and vice-versa. Theorem 4 provides this formula, which also gives a sharp condition for when a global minimum of the continuous problem is two-valued (binary), three-valued (trinary), or k-valued in the general case. This provides an understanding of the energy landscape, which is essential due to the lack of convexity present in the continuous problem. Most importantly, we can classify the types of local minima encountered and when they form a continuum. This is germane to the global convergence results of the previous sections. The proofs in this section follow closely the ideas from [13, 12].

5.1  Local  and  Global  Minima 

We first introduce the two fundamental definitions of this section. The first definition introduces the concept of when a set $S \subset V$ of vertices is compatible with an increasing sequence $S_1 \subset S_2 \subset \cdots \subset S_k$ of vertex subsets. Loosely speaking, a set $S$ is compatible with $S_1 \subset S_2 \subset \cdots \subset S_k$ whenever the cut defined by the pair $(S, S^c)$ neither intersects nor crosses any of the cuts $(S_i, S_i^c)$. Definition 6 formalizes this notion.

Definition 6 (Compatible Vertex Set). A vertex set $S$ is compatible with an increasing sequence $S_1 \subseteq S_2 \subseteq \cdots \subseteq S_k$ if $S \subseteq S_1$, $S_k \subseteq S$ or

$$S_1 \subseteq S_2 \subseteq \cdots \subseteq S_i \subseteq S \subseteq S_{i+1} \subseteq \cdots \subseteq S_k \quad \text{for some } 1 \le i \le k-1.$$
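
The compatibility condition can be checked mechanically. The following is a minimal sketch, assuming vertex sets are represented as Python sets and the chain is given as a list in increasing order; the function name `is_compatible` is hypothetical and not part of the original text.

```python
def is_compatible(S, chain):
    """Return True if the vertex set S is compatible with the increasing chain.

    `chain` is a list [S_1, ..., S_k] with S_1 <= S_2 <= ... <= S_k.
    """
    S = frozenset(S)
    chain = [frozenset(Si) for Si in chain]
    # Case 1: S lies inside the smallest set, or contains the largest one.
    if S <= chain[0] or chain[-1] <= S:
        return True
    # Case 2: S is sandwiched between two consecutive sets of the chain.
    return any(lo <= S <= hi for lo, hi in zip(chain, chain[1:]))


chain = [{1}, {1, 3}]
print(is_compatible({1}, chain))        # contained in S_1
print(is_compatible({1, 2, 3}, chain))  # contains S_2
print(is_compatible({2}, chain))        # crosses both cuts
```

A set fails the test exactly when it is incomparable (neither a subset nor a superset) with some member of the chain, which matches the "neither intersects nor crosses" picture.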

The concept of compatible cuts then allows us to introduce our notion of a local minimum of the combinatorial problem, i.e. definition 7.

Definition 7 (Combinatorial k-Local Minima). An increasing collection of nontrivial sets $S_1 \subset S_2 \subset \cdots \subset S_k$ is called a k-local minimum of the combinatorial problem if $\mathcal{C}(S_1) = \mathcal{C}(S_2) = \cdots = \mathcal{C}(S_k) \le \mathcal{C}(S)$ for all $S$ compatible with $S_1 \subset S_2 \subset \cdots \subset S_k$.

Pursuing the previous analogy, a collection of cuts $(S_1, S_1^c), \cdots, (S_k, S_k^c)$ forms a k-local minimum of the combinatorial problem precisely when they do not intersect, have the same energy and all other non-intersecting cuts $(S, S^c)$ have higher energy. The case of a 1-local minimum is paramount. A cut $(S_1, S_1^c)$ defines a 1-local minimum if and only if it has lower energy than all cuts that do not intersect it. As a consequence, if a 1-local minimum is not a global minimum then the cut $(S_1, S_1^c)$ necessarily intersects all of the cuts defined by the global minimizers. This is a fundamental characteristic of local minima: they are never "parallel" to global minima.
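
For small graphs, definition 7 can be verified by brute force. The sketch below assumes the balanced Cheeger energy $\mathcal{C}(S) = \mathrm{Cut}(S,S^c)/\min(|S|,|S^c|)$ from problem (1), with each undirected edge counted once; all function names are hypothetical, and the enumeration over all subsets is only practical for a handful of vertices. The example graph has three vertices with weights $(w_{12}, w_{13}, w_{23}) = (1, 2, 2)$.

```python
from fractions import Fraction
from itertools import combinations


def cheeger_energy(S, vertices, w):
    """C(S) = Cut(S, S^c) / min(|S|, |S^c|); w maps edges (i, j) to integer weights."""
    S = frozenset(S)
    cut = sum(wt for (i, j), wt in w.items() if (i in S) != (j in S))
    return Fraction(cut, min(len(S), len(vertices) - len(S)))


def is_compatible(S, chain):
    """S lies below S_1, above S_k, or between two consecutive chain members."""
    if S <= chain[0] or chain[-1] <= S:
        return True
    return any(lo <= S <= hi for lo, hi in zip(chain, chain[1:]))


def is_k_local_minimum(chain, vertices, w):
    """Check definition 7 by enumerating every nontrivial vertex subset."""
    chain = [frozenset(Si) for Si in chain]
    energies = {cheeger_energy(Si, vertices, w) for Si in chain}
    if len(energies) != 1:  # all sets of the chain must share one energy
        return False
    level = energies.pop()
    subsets = (frozenset(c) for r in range(1, len(vertices))
               for c in combinations(vertices, r))
    return all(level <= cheeger_energy(S, vertices, w)
               for S in subsets if is_compatible(S, chain))


vertices = (1, 2, 3)
w = {(1, 2): 1, (1, 3): 2, (2, 3): 2}
print(is_k_local_minimum([{1}, {1, 3}], vertices, w))
print(is_k_local_minimum([{3}], vertices, w))
```

On this graph the chain $\{x_1\} \subset \{x_1, x_3\}$ passes the check, while the single set $\{x_3\}$ does not, since the compatible set $\{x_1, x_3\}$ has lower energy.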

For the continuous problem, combinatorial k-local minima naturally correspond to vertex functions $f \in \mathbb{R}^n$ that take $(k+1)$ distinct values. We therefore define the concept of a $(k+1)$-valued local minimum of the continuous problem.

Definition 8 (Continuous $(k+1)$-valued Local Minima). We call a vertex function $f \in \mathbb{R}^n$ a $(k+1)$-valued local minimum of the continuous problem if $f$ is a local minimum of $E$ and if its range contains exactly $k+1$ distinct values.

Theorem 3 provides the intuitive picture connecting these two concepts of minima, and it follows as a corollary of the more technical and explicit theorem 4.

Theorem 3. The continuous problem has a $(k+1)$-valued local minimum if and only if the combinatorial problem has a k-local minimum.

For example, if the continuous problem has a trinary local minimum in the usual sense then the combinatorial problem must have a 2-local minimum in the sense of definition 7. As the cuts $(S_1, S_1^c)$ and $(S_2, S_2^c)$ defining a 2-local minimum do not intersect, a 2-local minimum separates the vertices of the graph into three disjoint domains. A trinary function therefore makes intuitive sense. We make this intuition precise in theorem 4. Before stating it we require two further definitions.

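Definition 9 (Characteristic Functions). Given $\emptyset \ne S \subset V$, define its characteristic function $f_S$ as

$$f_S = \mathrm{Cut}(S,S^c)^{-1}\chi_S \ \text{ if } |S| \le n/2 \quad \text{and} \quad f_S = -\mathrm{Cut}(S,S^c)^{-1}\chi_{S^c} \ \text{ if } |S| > n/2. \tag{16}$$

Note that $f_S$ has median zero and TV-norm equal to 1.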
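
The two properties claimed for $f_S$ can be checked numerically. This is a minimal sketch under stated assumptions: vertices are indexed from 0, the total variation is $T(f) = \sum_{i<j} w_{ij}|f_i - f_j|$ with each edge counted once, and the median follows the convention $\mathrm{med}(f) = \min\{c \in \mathrm{range}(f) : |\{f \le c\}| \ge n/2\}$. The helper names and the six-vertex example graph are illustrative choices.

```python
from fractions import Fraction


def median(f):
    """Smallest value c in the range of f with at least n/2 entries <= c."""
    return min(c for c in set(f) if 2 * sum(x <= c for x in f) >= len(f))


def total_variation(f, w):
    """T(f) = sum over edges of w_ij * |f_i - f_j| (each edge listed once in w)."""
    return sum(wt * abs(f[i] - f[j]) for (i, j), wt in w.items())


def characteristic_function(S, n, w):
    """f_S as in (16): scaled indicator of S, or negated indicator of S^c."""
    S = set(S)
    cut = sum(wt for (i, j), wt in w.items() if (i in S) != (j in S))
    if 2 * len(S) <= n:
        return [Fraction(1, cut) if i in S else Fraction(0) for i in range(n)]
    return [Fraction(0) if i in S else Fraction(-1, cut) for i in range(n)]


# Two triangles joined by a single edge, all weights equal to 1.
W = {(0, 1): 1, (0, 2): 1, (1, 2): 1, (2, 3): 1, (3, 4): 1, (3, 5): 1, (4, 5): 1}
for S in ({0, 1}, {0, 1, 2}, {0, 1, 2, 3}):
    f = characteristic_function(S, 6, W)
    print(sorted(S), median(f), total_variation(f, W))
```

The case $|S| = n/2$ is the reason the first branch uses $\le$: with the median convention above, the negated form would have a nonzero median when exactly half the vertices lie in $S$.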

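Definition 10 (Strict Convex Hull). Given $k$ functions $f_1, \cdots, f_k$, their strict convex hull is the set

$$\mathrm{sch}\{f_1, \cdots, f_k\} = \{\theta_1 f_1 + \cdots + \theta_k f_k : \theta_i > 0 \text{ for } 1 \le i \le k \text{ and } \theta_1 + \cdots + \theta_k = 1\}. \tag{17}$$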

Theorem 4 (Explicit Correspondence of Local Minima).

1. Suppose $S_1 \subset S_2 \subset \cdots \subset S_k$ is a k-local minimum of the combinatorial problem and let $f \in \mathrm{sch}\{f_{S_1}, \cdots, f_{S_k}\}$. Then any function of the form $g = \alpha f + \beta \mathbf{1}$ defines a $(k+1)$-valued local minimum of the continuous problem with $E(g) = \mathcal{C}(S_1)$.

2. Suppose that $f$ is a $(k+1)$-valued local minimum and let $c_1 > c_2 > \cdots > c_{k+1}$ denote its range. For $1 \le i \le k$ set $\Omega_i = \{f = c_i\}$. Then the increasing collection of sets $S_1 \subset \cdots \subset S_k$ given by

$$S_1 = \Omega_1, \quad S_2 = \Omega_1 \cup \Omega_2, \quad \cdots, \quad S_k = \Omega_1 \cup \cdots \cup \Omega_k$$

is a k-local minimum of the combinatorial problem with $\mathcal{C}(S_i) = E(f)$.

Remark 1 (Isolated vs Continuum of Local Minima). If a set $S_1$ is a 1-local minimum then the strict convex hull (17) of its characteristic function reduces to the single binary function $f_{S_1}$. Thus every 1-local minimum generates exactly one local minimum of the continuous problem in $S_0^{n-1}$, and this local minimum is binary. On the other hand, if $k \ge 2$ then every k-local minimum of the combinatorial problem generates a continuum (in $S_0^{n-1}$) of non-binary local minima of the continuous problem. As a consequence, the hypotheses of theorem 1, corollary 2 or corollary 3 can hold only if no such higher order k-local minima exist. When these theorems do apply the algorithms therefore converge to a binary function.

As a final consequence, we summarize the fact that theorem 4 implies that the continuous relaxation of the Cheeger cut problem is exact. In other words,

Theorem 5. Given $\{f \in \arg\min E\}$ there exists an explicit formula to construct the set $\{S \in \arg\min \mathcal{C}\}$, and vice-versa.


5.2  Proofs  of  Lemmas  and  Theorems 

The proof closely follows the arguments from [13]. Define the median of $f \in \mathbb{R}^n$ as

$$\mathrm{med}(f) = \min\{c \in \mathrm{range}(f) \text{ satisfying } |\{f \le c\}| \ge n/2\}. \tag{18}$$

By this definition, the median of $f$ is the $(n/2)$-th smallest entry when $n$ is even.

We define the TV-sphere $X$ by:

$$X = \{f \in \mathbb{R}^n : T(f) = 1 \text{ and } \mathrm{med}(f) = 0\}.$$

Definition 11 (Local Minima on the TV-sphere). $f \in X$ is a local minimum on the TV-sphere if there exists $\epsilon > 0$ such that $E(f) \le E(g)$ for all $g \in X$ satisfying $\|g - f\|_2 \le \epsilon$.

The following lemma states that it is enough to consider local minima of $E$ on the TV-sphere.

Lemma 11. A non-constant function $f \in \mathbb{R}^n$ is a local minimum of $E$ in the usual sense if and only if $\tilde f = (f - \mathrm{med}(f))/T(f - \mathrm{med}(f))$ is a local minimum of $E$ on the TV-sphere.


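Proof. Suppose that

$$\tilde f = \mathrm{Proj}_X(f) = \frac{f - \mathrm{med}(f)}{T(f - \mathrm{med}(f))} \tag{19}$$

is a local minimum of $E$ on $X$ but $f$ is not a local minimum of $E$ on $\mathbb{R}^n$. Then there exists $f_n \to f$ with $E(f_n) < E(f)$. By continuity of $\mathrm{Proj}_X$, $\tilde f_n := \mathrm{Proj}_X(f_n) \to \tilde f$, and since $E$ is invariant under $\mathrm{Proj}_X$, we have $E(\tilde f_n) < E(\tilde f)$, which is a contradiction. Suppose now that $f$ is a local minimum of $E$ in $\mathbb{R}^n$ but $\tilde f$ is not a local minimum of $E$ on $X$. Then there exists $\tilde f_n \to \tilde f$ with $E(\tilde f_n) < E(\tilde f)$. Since there exist $\alpha \ne 0$ and $\beta$ such that $f = \alpha \tilde f + \beta \mathbf{1}$ it is clear that $\alpha \tilde f_n + \beta \mathbf{1} \to f$ and $E(\alpha \tilde f_n + \beta \mathbf{1}) < E(f)$, which is a contradiction. □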


Recall that a polyhedron is a set defined by a finite number of linear equalities and inequalities, and that it is necessarily convex. Given a permutation $\sigma \in \mathfrak{S}_n$ the polyhedron

$$\mathcal{P}_\sigma = \{f \in \mathbb{R}^n : f_{\sigma(1)} \ge f_{\sigma(2)} \ge \cdots \ge f_{\sigma(n)}\}$$

represents one possible ordering of the function $f \in \mathbb{R}^n$. We then define the face $\mathcal{F}_\sigma$ of the TV-sphere by

$$\mathcal{F}_\sigma = \{f \in \mathbb{R}^n : f \in \mathcal{P}_\sigma,\ \|f\|_{TV} = 1 \text{ and } \mathrm{med}(f) = 0\}.$$

As the median and the total variation are linear functions on $\mathcal{P}_\sigma$, we have simply added two linear constraints so that $\mathcal{F}_\sigma$ is also a polyhedron. Obviously we have

$$X = \bigcup_\sigma \mathcal{F}_\sigma$$

where the union is taken over all possible permutations. Using the same arguments from [13, Lemma 2.1] yields:

Lemma 12 ([13]). Suppose $f \in \mathcal{F}_\sigma$. Then $f$ is a binary function if and only if $f$ is an extreme point of $\mathcal{F}_\sigma$.

The next lemma then gives an explicit description of the face $\mathcal{F}_\sigma$.

Lemma 13. $\mathcal{F}_\sigma$ is the $n-2$ dimensional simplex

$$\mathcal{F}_\sigma = \mathrm{ch}\{f_{S_1}, f_{S_2}, \cdots, f_{S_{n-1}}\}. \tag{20}$$

Here, $\mathrm{ch}\{f_{S_1}, f_{S_2}, \cdots, f_{S_{n-1}}\}$ denotes the convex hull of the characteristic functions $f_{S_i}$ of the increasing sequence of sets $S_1 \subset S_2 \subset \cdots \subset S_{n-1}$ defined by $S_i = \{x_{\sigma(1)}, \cdots, x_{\sigma(i)}\}$, $1 \le i \le n-1$. Moreover the functions $f_{S_1}, f_{S_2}, \cdots, f_{S_{n-1}}$ are linearly independent.

Proof. The fact that $f_{S_1}, f_{S_2}, \cdots, f_{S_{n-1}}$ are linearly independent (and therefore affinely independent) can be directly read from definition (16) of $f_S$. Also from this same definition it is clear that $f_{S_1}, f_{S_2}, \cdots, f_{S_{n-1}}$ are binary functions that belong to $\mathcal{F}_\sigma$, and that these are the only such binary functions. The conclusion then comes from the fact that a compact convex set is the convex hull of its extreme points. □

Proposition 1 (Decomposition in Binary Functions). Let $f \in X$ be a function whose range contains exactly $k$ distinct values. Then there exists a unique increasing collection of nontrivial sets $S_1 \subset S_2 \subset \cdots \subset S_k$ and a unique vector $\theta = (\theta_1, \cdots, \theta_k)^T > 0$, $\theta \cdot \mathbf{1} = 1$, so that

$$f = \sum_{i=1}^{k} \theta_i f_{S_i}. \tag{21}$$

We will refer to (21) as the decomposition of $f$ in binary functions.


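Proof. Since $f \in \mathcal{F}_\sigma$ for some permutation $\sigma$ the existence of such a decomposition is clear from Lemma 13. Suppose now that (21) is such a decomposition. To show that the decomposition is unique, let $i_0$ be such that $|S_{i_0}| \le n/2$ and $|S_{i_0+1}| > n/2$. Also let $\alpha_i = 1/\mathrm{Cut}(S_i, S_i^c) > 0$. Then combining (16) and (21) we find that

$$f(x) = \begin{cases}
\alpha_1\theta_1 + \alpha_2\theta_2 + \alpha_3\theta_3 + \cdots + \alpha_{i_0}\theta_{i_0} & \text{if } x \in S_1 \\
\alpha_2\theta_2 + \alpha_3\theta_3 + \cdots + \alpha_{i_0}\theta_{i_0} & \text{if } x \in S_2 \setminus S_1 \\
\alpha_3\theta_3 + \cdots + \alpha_{i_0}\theta_{i_0} & \text{if } x \in S_3 \setminus S_2 \\
\qquad \vdots & \\
\alpha_{i_0}\theta_{i_0} & \text{if } x \in S_{i_0} \setminus S_{i_0-1} \\
0 & \text{if } x \in S_{i_0+1} \setminus S_{i_0} \\
-\alpha_{i_0+1}\theta_{i_0+1} & \text{if } x \in S_{i_0+2} \setminus S_{i_0+1} \\
-\alpha_{i_0+1}\theta_{i_0+1} - \alpha_{i_0+2}\theta_{i_0+2} & \text{if } x \in S_{i_0+3} \setminus S_{i_0+2} \\
\qquad \vdots & \\
-\alpha_{i_0+1}\theta_{i_0+1} - \alpha_{i_0+2}\theta_{i_0+2} - \cdots - \alpha_k\theta_k & \text{if } x \in S_k^c
\end{cases}$$

Therefore $f$ takes its greatest value on $S_1$, its second greatest value on $S_2 \setminus S_1$, etc. As a consequence the sets $S_i$ are uniquely determined by $f$, and since the $f_{S_i}$ are linearly independent there is a unique possible choice for the $\theta_i$. □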


As a direct corollary of the previous proof, the decomposition of a function $f$ in binary functions can easily be recovered.

Corollary 4. Suppose $f \in X$ and $\mathrm{range}(f) = \{c_1, \cdots, c_k\}$ where $c_1 > c_2 > \cdots > c_k$. Let $f = \sum_i \theta_i f_{S_i}$ be its unique decomposition in binary functions. Then

$$S_i = \bigcup_{j=1}^{i} \{f = c_j\}, \qquad i = 1, \cdots, k-1.$$
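
The recovery of the decomposition can be sketched in code. The sets come directly from the level sets as in corollary 4. For the coefficients, the sketch uses the relation $\theta_i = (c_i - c_{i+1})\,\mathrm{Cut}(S_i, S_i^c)$, which is an inference from the piecewise form of $f$ in the proof of Proposition 1 (consecutive values differ by $\alpha_i\theta_i$) rather than a formula stated in the text. Vertices are indexed from 0 and the function names are hypothetical.

```python
from fractions import Fraction


def cut_value(S, w):
    """Cut(S, S^c) with each undirected edge listed once in w."""
    return sum(wt for (i, j), wt in w.items() if (i in S) != (j in S))


def binary_decomposition(f, w):
    """Return (sets, thetas) such that f = sum_i thetas[i] * f_{S_i}.

    Assumes f lies on the TV-sphere (median zero, total variation one).
    """
    values = sorted(set(f), reverse=True)  # c_1 > c_2 > ... > c_k
    sets, thetas = [], []
    S = set()
    for c_hi, c_lo in zip(values, values[1:]):
        # S_i is the union of the level sets of the i largest values.
        S |= {i for i, x in enumerate(f) if x == c_hi}
        sets.append(frozenset(S))
        thetas.append((c_hi - c_lo) * cut_value(S, w))
    return sets, thetas


# Three-vertex graph with weights (w_01, w_02, w_12) = (1, 2, 2).
W = {(0, 1): 1, (0, 2): 2, (1, 2): 2}
f = [Fraction(1, 12), Fraction(-1, 4), Fraction(0)]
sets, thetas = binary_decomposition(f, W)
print(sets, thetas, sum(thetas))
```

Exact rational arithmetic (`fractions.Fraction`) keeps the level-set comparison `x == c_hi` reliable; with floating-point inputs a tolerance would be needed.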

Lemma 14. Let $f \in X$ and let $\sum_i \theta_i f_{S_i}$ be its unique decomposition in binary functions. Also let $S$ be a nontrivial set. Then $f$ and $f_S$ belong to a common face of the TV-sphere if and only if $S$ is compatible with $S_1 \subset \cdots \subset S_k$.

Proof. Suppose $S$ is not compatible with $S_1 \subset \cdots \subset S_k$. Then there exists $S_i$ such that $S \not\subseteq S_i$ and $S_i \not\subseteq S$. Then there exist $x_{in} \in S \setminus S_i$ and $x_{out} \in S_i \setminus S$. Since $x_{in} \in S$ and $x_{out} \notin S$ it is clear from (16) that $f_S(x_{in}) > f_S(x_{out})$. On the other hand, since $x_{out} \in S_i$ and $x_{in} \notin S_i$ we have by definition of the binary decomposition that $f(x_{out}) > f(x_{in})$. This follows as the values that $f$ takes on $S_i$ are greater than or equal to the values that it takes outside of $S_i$. Thus $f_S$ and $f$ have a different ordering and therefore cannot belong to a common face of the TV-sphere.

Similarly, if $f$ and $f_S$ have a different ordering then there exist two points $x_{in}$ and $x_{out}$ such that $f_S(x_{in}) > f_S(x_{out})$ and $f(x_{in}) < f(x_{out})$. Clearly $x_{in} \in S$ and $x_{out} \notin S$. On the other hand there must exist an $S_i$ such that $x_{in} \notin S_i$ and $x_{out} \in S_i$. This implies that $S \not\subseteq S_i$ and $S_i \not\subseteq S$. Therefore $S$ is not compatible with $S_1 \subset \cdots \subset S_k$. □

We  are  now  ready  to  prove  theorem  4. 

Proof of Theorem 4. Given a function $f \in X$, define its binary neighbors on the TV-sphere by

$$\mathcal{N}_{bin}(f) = \{g \in X : g \text{ is binary and } f \text{ and } g \text{ belong to a common face of the TV-sphere}\}.$$

A function $f \in X$ is a local minimum of $E$ on the TV-sphere if and only if $f$ is a local maximum of the $\ell^1$-norm on the TV-sphere. As we restrict to functions with zero median, the $\ell^1$-norm is a linear function on each face $\mathcal{F}_\sigma$. Therefore a function $f$ is a local maximum of the $\ell^1$-norm if and only if

$$\|f\|_1 \ge \|g\|_1 \quad \text{for all } g \in \mathcal{N}_{bin}(f). \tag{22}$$

Indeed, if $f$ has a binary neighbor $g$ with strictly greater norm then any function of the form $\theta f + (1-\theta)g$, $\theta \in (0,1)$, has strictly greater $\ell^1$-norm than $f$. Therefore $f$ is not a local maximum. On the other hand assume that (22) holds and let $\mathcal{F}_\sigma$ be a face to which $f$ belongs. Then all the extreme points of $\mathcal{F}_\sigma$ belong to $\mathcal{N}_{bin}(f)$ and therefore $f$ has $\ell^1$-norm greater than or equal to that of the extreme points. Therefore $f$ has $\ell^1$-norm greater than or equal to that of all the functions in $\mathcal{F}_\sigma$. As the face $\mathcal{F}_\sigma$ to which $f$ belonged was arbitrary, $f$ must be a local maximum.

To prove the first statement of the theorem, suppose $S_1 \subset S_2 \subset \cdots \subset S_k$ is a k-local minimum of the combinatorial problem and that $f \in \mathrm{sch}\{f_{S_1}, \cdots, f_{S_k}\}$. That is, $f = \sum_{i=1}^{k} \theta_i f_{S_i}$ where each $\theta_i > 0$ and the $\theta_i$ sum to 1. Using Lemma 14 we see that

$$\mathcal{N}_{bin}(f) = \{f_S : S \text{ is compatible with } S_1 \subset \cdots \subset S_k\}. \tag{23}$$

As $S_1 \subset \cdots \subset S_k$ is a combinatorial k-local minimum by assumption, inequality (22) holds and $f$ is a local minimum of the energy on the TV-sphere.

To prove the second statement of the theorem, suppose that $f$ is a local minimum and let $f = \sum_i \theta_i f_{S_i}$ be its decomposition in binary functions. As the functions $f_{S_1}, \cdots, f_{S_k}$ belong to the same face of the TV-sphere, we must have $E(f) = E(f_{S_1}) = \cdots = E(f_{S_k})$. This, in turn, implies $\mathcal{C}(S_1) = \cdots = \mathcal{C}(S_k)$. The binary neighbors of $f$ are again defined by (23) and therefore, because of (22), we must have $E(f) \le E(f_S)$ for all $S$ compatible with $S_1 \subset \cdots \subset S_k$. This implies that $S_1 \subset S_2 \subset \cdots \subset S_k$ is a k-local minimum of the combinatorial problem. □

5.3  Critical  points 

To conclude, we provide a few simple examples that illustrate the previous theorems and demonstrate the distinction between local minima and critical points (definition 5). Consider first the graph on three vertices $V = \{x_1, x_2, x_3\}$ with symmetric edge weights $(w_{12}, w_{13}, w_{23}) = (1, 2, 2)$, i.e. an isosceles triangle (see figure 1 (a)). To see that a continuum in $S_0^2$ of global minima may occur, define $p_\alpha = (\alpha, \alpha - 1, 0)$ for $\alpha \in [0, 1]$. Then $\mathrm{med}(p_\alpha) = 0$, $\|p_\alpha\|_1 = 1$ and $T(p_\alpha) = 3$ for all $\alpha \in [0, 1]$. Thus,

$$E(p_\alpha) = 3 = \min E(f).$$

If we then set $f_\alpha = p_\alpha/\|p_\alpha\|_2 \in S_0^2$, we have that $E(f_\alpha) = 3$ for all $\alpha \in [0, 1]$. As each $f_\alpha$ attains the global minimum of $E$ on $S_0^2$, it follows that $0 \in \partial T(f_\alpha) - E(f_\alpha)\partial_0 B(f_\alpha)$ for each $\alpha \in [0, 1]$ as well. We therefore have a continuum of critical points that are also global minima. This corresponds to the fact that the sets $S_1 = \{x_1\}$ and $S_2 = \{x_1, x_3\}$ define a 2-local minimum of the combinatorial problem according to definition 7.

Figure 1: Small Graph Examples. (a) Isosceles Triangle.
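
The constancy of the energy along the family $p_\alpha$ can be confirmed with exact arithmetic. This is a minimal sketch, assuming the energy is $E(f) = T(f)/\|f - \mathrm{med}(f)\mathbf{1}\|_1$ with $T(f) = \sum_{i<j} w_{ij}|f_i - f_j|$ and the median convention of (18); vertices are indexed from 0 and the function names are illustrative.

```python
from fractions import Fraction

# Isosceles triangle: (w_01, w_02, w_12) = (1, 2, 2) in 0-based indexing.
W = {(0, 1): 1, (0, 2): 2, (1, 2): 2}


def median(f):
    """Smallest value c in the range of f with at least n/2 entries <= c."""
    return min(c for c in set(f) if 2 * sum(x <= c for x in f) >= len(f))


def energy(f, w):
    """E(f) = T(f) / ||f - med(f) 1||_1 (assumed form of the Cheeger functional)."""
    m = median(f)
    tv = sum(wt * abs(f[i] - f[j]) for (i, j), wt in w.items())
    return Fraction(tv) / sum(abs(x - m) for x in f)


alphas = [Fraction(k, 10) for k in range(11)]
energies = [energy([a, a - 1, Fraction(0)], W) for a in alphas]
print(set(energies))
```

Because this energy is invariant under positive rescaling, the normalized functions $f_\alpha = p_\alpha/\|p_\alpha\|_2$ share the same value.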


We next examine the graph on six vertices $V = \{x_1, x_2, x_3, x_4, x_5, x_6\}$ that has symmetric edge weights with non-zero entries $w_{12} = w_{13} = w_{23} = w_{34} = w_{45} = w_{46} = w_{56} = 1$. We call this graph the bowtie (see figure 1 (b)). Consider the cut defined by the binary function $f = (1, 1, 0, 0, 0, 0)^T$ that has energy $E(f) = 1$. According to definition 5, $f$ defines a critical point of the energy if there exist $w \in \partial T(f)$ and $v \in \partial_0 B(f)$ so that $w = v$. By taking $v = (1, 1, -1, -1, 0, 0)$ and computing the subdifferential of $T$ explicitly, we see this occurs if there exist $s_{ij} \in [-1, 1]$ satisfying

$$\begin{pmatrix} s_{12} + 1 \\ -s_{12} + 1 \\ -2 + s_{34} \\ -s_{34} + s_{45} + s_{46} \\ -s_{45} + s_{56} \\ -s_{46} - s_{56} \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ -1 \\ -1 \\ 0 \\ 0 \end{pmatrix}.$$

This requires $s_{12} = 0$ and $s_{34} = 1$, which then yields a convenient choice $s_{45} = s_{46} = 0$ (and hence $s_{56} = 0$). Thus $f$ defines a critical point of the energy. However, direct computation shows that $f^\theta := (1, \theta, 0, 0, 0, 0)^T$ has strictly greater energy for any $0 < \theta < 1$ and $f^\kappa := (1, 1, \kappa, 0, 0, 0)^T$ has strictly lesser energy for any $0 < \kappa < 1$. Thus we have a critical point that is neither a local maximum nor a local minimum. In particular, corollary 3 does not apply even for this simple example.
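
Both claims about the bowtie can be checked directly. The sketch below assumes $E(f) = T(f)/\|f - \mathrm{med}(f)\mathbf{1}\|_1$ with each edge counted once in $T$, and builds an element of $\partial T(f)$ as $w_i = \sum_j w_{ij} s_{ij}$ with antisymmetric multipliers $s_{ij} \in \mathrm{sign}(f_i - f_j)$. Vertices are indexed from 0, so $s_{12}$ in the text corresponds to the key `(0, 1)`; the helper names are hypothetical.

```python
from fractions import Fraction

# Bowtie graph with 0-based vertices; every listed edge has weight 1.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]


def median(f):
    """Smallest value c in the range of f with at least n/2 entries <= c."""
    return min(c for c in set(f) if 2 * sum(x <= c for x in f) >= len(f))


def energy(f):
    """E(f) = T(f) / ||f - med(f) 1||_1 (assumed form of the Cheeger functional)."""
    m = median(f)
    tv = sum(abs(f[i] - f[j]) for i, j in EDGES)
    return Fraction(tv) / sum(abs(x - m) for x in f)


def subgradient_of_T(f, s):
    """Assemble w in dT(f) from edge multipliers s[(i, j)], checking admissibility."""
    w = [Fraction(0)] * len(f)
    for i, j in EDGES:
        sij = Fraction(s[(i, j)])
        if f[i] != f[j] and sij != (1 if f[i] > f[j] else -1):
            raise ValueError("multiplier must equal sign(f_i - f_j)")
        if abs(sij) > 1:
            raise ValueError("multiplier must lie in [-1, 1]")
        w[i] += sij  # contribution of s_ij to vertex i
        w[j] -= sij  # antisymmetry: s_ji = -s_ij
    return w


f = [1, 1, 0, 0, 0, 0]
v = [1, 1, -1, -1, 0, 0]
s = {(0, 1): 0, (0, 2): 1, (1, 2): 1, (2, 3): 1, (3, 4): 0, (3, 5): 0, (4, 5): 0}
w = subgradient_of_T(f, s)
half = Fraction(1, 2)
print(w == v, energy(f))
print(energy([1, half, 0, 0, 0, 0]), energy([1, 1, half, 0, 0, 0]))
```

The perturbations use $\theta = \kappa = 1/2$ as representative values; under the assumed energy the general expressions are $2/(1+\theta)$ and $(2-\kappa)/(2+\kappa)$, which lie above and below 1 respectively on the open interval.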


6  Experiments 

In all experiments, we take the constant $c = 1$ in the SD algorithm. We use the method from [3] to solve the minimization problem in the SD algorithm and the method from [7] to solve the minimization problem in the IPM algorithm. We terminate each minimization when either a stopping tolerance $\varepsilon$ is met (i.e. successive iterates $u$ differ by at most $\varepsilon$ in the $\ell^1$-norm) or 2,000 iterations is reached. All experiments that follow use a symmetric k-nearest neighbor graph combined with the weight similarity function $w_{ij} = \exp(-r_{ij}^2/\sigma^2)$. Here, $r_{ij} = \|x_i - x_j\|_2$ and the scale parameter $\sigma^2 = 3d_k^2$, where $d_k$ denotes the mean distance of the k-th nearest neighbor. We use the two-moon, MNIST, USPS and COIL datasets. The two-moon dataset [2] uses the same setting as in [13]. We take $k = 5$ nearest neighbors to construct the graph. We preprocessed the MNIST, USPS and COIL data by projecting onto the first 50 principal components, and take $k = 10$ nearest neighbors for the MNIST and USPS datasets and $k = 5$ nearest neighbors for the COIL dataset. The first set of experiments considers the two-moon dataset and pairs of image digits extracted from the MNIST dataset. The first table summarizes the results of these tests. It shows the mean Cheeger energy value (2), the mean error of classification (% of misclassified data) and the mean computational time for both algorithms over 10 experiments with the same random initialization for both algorithms in each of the individual experiments.
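
The graph construction can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes Gaussian weights $w_{ij} = \exp(-r_{ij}^2/\sigma^2)$ with $\sigma^2 = 3d_k^2$, takes $d_k$ to be the mean distance to the k-th nearest neighbor, and symmetrizes by keeping an edge whenever either endpoint lists the other among its k nearest neighbors. The function name and the brute-force distance computation are hypothetical choices suited only to small inputs.

```python
import math


def knn_gaussian_weights(points, k):
    """Return {(i, j): w_ij} with i < j for a symmetric k-nearest-neighbor graph."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbors of each point, excluding the point itself.
    nbrs = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
            for i in range(n)]
    # Mean distance to the k-th nearest neighbor sets the scale (assumed convention).
    d_k = sum(dist[i][nbrs[i][-1]] for i in range(n)) / n
    sigma2 = 3 * d_k ** 2
    weights = {}
    for i in range(n):
        for j in nbrs[i]:
            # "Or" symmetrization: keep the edge if either endpoint selects it.
            edge = (min(i, j), max(i, j))
            weights[edge] = math.exp(-dist[i][j] ** 2 / sigma2)
    return weights


points = [(0.0,), (1.0,), (2.0,), (10.0,)]
weights = knn_gaussian_weights(points, k=1)
print(sorted(weights))
```

For datasets of realistic size a spatial index or an approximate nearest-neighbor search would replace the quadratic distance table.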


                      SD Algorithm                        Modified IPM Algorithm [7]
                Energy   Error (%)   Time (sec.)     Energy   Error (%)   Time (sec.)
2 moons         0.126    8.69        2.06            0.145    14.12       1.98
4's and 9's     0.115    1.65        52.4            0.185    20.53       57.3
3's and 8's     0.086    1.217       49.2            0.086    1.219       48.1

Our second set of experiments applies both algorithms to multi-class clustering problems using a standard, recursive bi-partitioning method. The table below presents the mean Cheeger energy, classification error and time over 10 experiments as before.

                          SD Algorithm                       Modified IPM Algorithm [7]
                      Energy   Err. (%)   Time (min.)    Energy   Err. (%)   Time (min.)
MNIST (10 classes)    1.30     11.78      45.01          1.29     11.75      42.83
USPS (10 classes)     2.37     4.11       5.15           2.37     4.13       4.81
COIL (20 classes)     0.19     1.58       4.31           0.18     2.52       4.20

Overall, the results show that both algorithms perform equivalently for both two-class and multi-class clustering problems. As our interest here lies in the theoretical properties of both algorithms, we will study practical implementation details for the SD algorithm in future work. For instance, as Hein and Bühler remark [6], solving the minimization problem for the IPM algorithm precisely is unnecessary. Analogously for the SD algorithm, we only need to lower the energy sufficiently before proceeding to the next iteration of the algorithm. It proves convenient to stop the minimization when a weaker form of the energy inequality (6) holds, such as

$$E(f) \ge E(h) + \theta \left( \frac{E(f)}{B(h)} \frac{\|h - f\|_2^2}{c} \right)$$

for some constant $0 < \theta < 1$. This condition provably holds in a finite number of iterations and still guarantees that $\|f^{k+1} - f^k\|_2 \to 0$. The concrete decay estimate provided by the SD algorithm therefore allows us to give precise meaning to "sufficiently lowers the energy." We investigate these aspects of the algorithm and prove convergence for this practical implementation in future work.


Reproducible research: The code is available at http://www.cs.cityu.edu.hk/~xbresson/codes.html

A  Closedness of $\mathcal{H}$

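Define the annulus

$$K_0 = \{u \in \mathbb{R}^n : 1 \le \|u\|_2 \le 1 + c\sqrt{n}\} \tag{24}$$

along with the set-valued map $Y^c : S_0^{n-1} \rightrightarrows K_0$,

$$Y^c(f) := f + c\,\partial_0 B(f).$$

That the range of $Y^c$ lies in $K_0$ follows from (9).

Lemma 15. The set-valued map $Y^c$ is closed.

Proof. We first show the set-valued map $\partial_0 B$ on $S_0^{n-1}$ is closed. To this end, given any

$$f^k \to f^* \quad \text{with } f^k \in S_0^{n-1}, \tag{25}$$

$$z^k \in \partial_0 B(f^k) \quad \text{with } z^k \to z^*, \tag{26}$$

we must show that $z^* \in \partial_0 B(f^*)$. As $B(g) \ge B(f^k) + \langle z^k, g - f^k \rangle$ for all $g \in \mathbb{R}^n$ by definition, by continuity of $B$ on $S_0^{n-1}$ we have $B(g) \ge B(f^*) + \langle z^*, g - f^* \rangle$ as well. Moreover, $\langle z^*, \mathbf{1} \rangle = \lim \langle z^k, \mathbf{1} \rangle = 0$ and $z^* \in \partial_0 B(f^*)$ as desired. To show that $Y^c$ is closed, assume

$$f^k \to f^* \quad \text{with } f^k \in S_0^{n-1}, \tag{27}$$

$$g^k \in Y^c(f^k), \qquad g^k = f^k + c z^k \to g^* \tag{28}$$

for some $z^k \in \partial_0 B(f^k)$. As $\{z^k\}$ lies in a compact set and $\partial_0 B$ is closed, there exists a subsequence with $f^{k_i} \to f^*$ and $z^{k_i} \to z^* \in \partial_0 B(f^*)$. Therefore

$$g^* = \lim g^{k_i} = f^* + c z^* \in Y^c(f^*)$$

by the definition of $Y^c$, as desired. □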


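Define the function $\Psi^c : S_0^{n-1} \times K_0 \to \mathbb{R}^n$ by

$$\Psi^c(f, g) = \arg\min_u \left\{ T(u) + \frac{E(f)}{2c} \|u - g\|_2^2 \right\}.$$

Lemma 16. The function $\Psi^c$ is continuous on $S_0^{n-1} \times K_0$.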


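Proof. Let $h = \Psi^c(f, g)$ and $h' = \Psi^c(f', g')$. Then we have $E(f)\frac{h - g}{c} \in -\partial T(h)$ and $E(f')\frac{h' - g'}{c} \in -\partial T(h')$ so

$$T(h') \ge T(h) - \left\langle E(f)\tfrac{h - g}{c},\ h' - h \right\rangle,$$

$$T(h) \ge T(h') - \left\langle E(f')\tfrac{h' - g'}{c},\ h - h' \right\rangle.$$

By adding these two inequalities,

$$\langle E(f)(h - g) - E(f')(h' - g'),\ h - h' \rangle \le 0.$$

Adding and subtracting $E(f)(h' - g')$ we get

$$\langle E(f)(h - g) - E(f)(h' - g'),\ h - h' \rangle + \langle (E(f) - E(f'))(h' - g'),\ h - h' \rangle \le 0,$$

$$E(f)\langle (h - h') - (g - g'),\ h - h' \rangle + (E(f) - E(f'))\langle h' - g',\ h - h' \rangle \le 0,$$

$$E(f)\big( \|h - h'\|_2^2 - \langle g - g',\ h - h' \rangle \big) + (E(f) - E(f'))\langle h' - g',\ h - h' \rangle \le 0,$$

$$\|h - h'\|_2^2 \le \langle g - g',\ h - h' \rangle - \frac{E(f) - E(f')}{E(f)} \langle h' - g',\ h - h' \rangle.$$

From Cauchy-Schwarz we have

$$\|h' - h\|_2 \le \|g' - g\|_2 + \frac{|E(f') - E(f)|}{E(f)} \|h' - g'\|_2,$$

where the factor $\|h' - g'\|_2$ is bounded by means of (11). We then easily conclude that if $(f', g') \to (f, g)$ then $h' \to h$, due to the continuity of $E$ on $S_0^{n-1}$. □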


Next, define the set-valued map $\mathcal{H} : S_0^{n-1} \rightrightarrows \mathbb{R}^n$ by

$$\mathcal{H}(f) := \Psi^c(f, Y^c(f)).$$

Note this definition coincides with the definition in lemma 1.

Lemma 17. The set-valued map $\mathcal{H}$ is closed.

Proof. Suppose that

$$f^k \to f^*, \tag{29}$$

$$h^k \in \mathcal{H}(f^k) = \Psi^c(f^k, Y^c(f^k)), \qquad h^k \to h^*. \tag{30}$$

We must show that $h^* \in \mathcal{H}(f^*)$. Clearly there exist $g^k \in Y^c(f^k)$ such that $h^k = \Psi^c(f^k, g^k)$. As the sequence $\{g^k\}$ is in the compact set $K_0$ there exists $g^* \in K_0$ and a subsequence $g^{k_i} \to g^*$. Consequently

$$f^{k_i} \to f^*, \tag{31}$$

$$g^{k_i} \to g^*, \tag{32}$$

from which we may conclude $g^* \in Y^c(f^*)$ because $Y^c$ is closed. Now since $\Psi^c$ is continuous we have

$$h^{k_i} = \Psi^c(f^{k_i}, g^{k_i}) \to \Psi^c(f^*, g^*) \in \Psi^c(f^*, Y^c(f^*)) = \mathcal{H}(f^*).$$

But $h^{k_i} \to h^*$, so we may conclude $h^* \in \mathcal{H}(f^*)$ as desired. □